
Fix Codex cost scanner overcounting and cross-day undercounting#680

Open
xx205 wants to merge 1 commit into steipete:main from xx205:codex/fix-codex-cost-scanner-fork-crossday

xx205 commented Apr 10, 2026

Summary

Fix two issues in the local Codex cost scanner:

  1. Forked child sessions could replay the parent session's cumulative total_token_usage, causing severe overcounting.
  2. Long-lived sessions could be missed when their file lived under an older date-partition directory even though some of their token_count timestamps fell inside the report window, causing undercounting.

This also hardens fork baseline selection by comparing parsed timestamps instead of relying only on lexical string ordering.
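Lexical ordering breaks down when UTC (`Z`) and offset (`+08:00`) timestamps are mixed, because the string no longer sorts in instant order. The minimal Swift sketch below shows the failure and a parse-first comparison with a lexical fallback. `parseTimestamp` and `isAtOrBefore` are hypothetical helpers for illustration, not CodexBar's actual API, and printing stands in for whatever warning mechanism the scanner uses.

```swift
import Foundation

/// Hypothetical helper: parses an ISO 8601 timestamp such as
/// "2026-03-11T02:00:00Z" or "2026-03-11T09:30:00.123+08:00".
/// Returns nil if the string matches neither format.
func parseTimestamp(_ s: String) -> Date? {
    let formatter = ISO8601DateFormatter()
    // Try with fractional seconds first, then without.
    formatter.formatOptions = [.withInternetDateTime, .withFractionalSeconds]
    if let date = formatter.date(from: s) { return date }
    formatter.formatOptions = [.withInternetDateTime]
    return formatter.date(from: s)
}

/// Hypothetical helper: true if `a` is at or before `b`. Compares parsed
/// instants; if either side fails to parse, prints a warning and falls back
/// to lexical string order.
func isAtOrBefore(_ a: String, _ b: String) -> Bool {
    if let da = parseTimestamp(a), let db = parseTimestamp(b) {
        return da <= db
    }
    print("warning: unparseable timestamp (\(a), \(b)); using lexical order")
    return a <= b
}

let utc = "2026-03-11T02:00:00Z"
let offset = "2026-03-11T09:30:00+08:00"  // the same instant as 01:30 UTC

print(offset <= utc)              // false: lexical order puts "09" after "02"
print(isAtOrBefore(offset, utc))  // true: 01:30 UTC is before 02:00 UTC
```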

Root Cause

  • The scanner previously treated each session file independently. For forked Codex sessions, child logs may start by replaying the parent's cumulative token history, so inherited usage was counted again as child usage.
  • The scanner only enumerated session files from directories near the requested day window. That misses long-lived sessions stored under older path partitions even when the file contains in-window token_count.timestamp events.
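The first root cause can be made concrete with a minimal Swift sketch of baseline subtraction. It assumes a simplified model in which each token_count snapshot is reduced to one cumulative integer; `TokenSnapshot`, `inheritedBaseline`, and `childOwnUsage` are hypothetical names, not CodexBar's types, and the numbers are illustrative. The baseline is the latest parent snapshot at or before the fork, and the child is credited only with what it added on top of it.

```swift
import Foundation

// Hypothetical, simplified model: a cumulative total_token_usage snapshot
// reduced to a single integer.
struct TokenSnapshot {
    let timestamp: Date
    let totalTokens: Int
}

/// The parent's cumulative total at the fork: the latest parent snapshot at
/// or before `forkAt`, or 0 if the parent has no snapshot that early.
func inheritedBaseline(parent: [TokenSnapshot], forkAt: Date) -> Int {
    let eligible = parent.filter { $0.timestamp <= forkAt }
    return eligible.max(by: { $0.timestamp < $1.timestamp })?.totalTokens ?? 0
}

/// Tokens the child itself used: its latest cumulative total minus the
/// inherited baseline, clamped at zero.
func childOwnUsage(child: [TokenSnapshot], parent: [TokenSnapshot], forkAt: Date) -> Int {
    guard let latest = child.max(by: { $0.timestamp < $1.timestamp }) else { return 0 }
    return max(0, latest.totalTokens - inheritedBaseline(parent: parent, forkAt: forkAt))
}

func at(_ seconds: TimeInterval) -> Date { Date(timeIntervalSince1970: seconds) }

let parent = [
    TokenSnapshot(timestamp: at(100), totalTokens: 1_000),
    TokenSnapshot(timestamp: at(200), totalTokens: 5_000),
    TokenSnapshot(timestamp: at(400), totalTokens: 9_000),  // after the fork
]
let forkAt = at(250)
// The child log starts by replaying the parent's 5,000 tokens, then adds 1,200.
let child = [
    TokenSnapshot(timestamp: at(250), totalTokens: 5_000),
    TokenSnapshot(timestamp: at(300), totalTokens: 6_200),
]

print(inheritedBaseline(parent: parent, forkAt: forkAt))            // 5000
print(childOwnUsage(child: child, parent: parent, forkAt: forkAt))  // 1200
```

Treating the child file independently would credit it with 6,200 tokens instead of 1,200, counting the parent's first 5,000 tokens twice. Across many forks of a large parent session, this duplication compounds.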

Changes

  • Parse and store forked_from_id for Codex session files.
  • Resolve parent sessions by parsed session_meta.id.
  • Parse parent cumulative token snapshots and subtract the inherited baseline at the child fork timestamp.
  • Compare parent snapshot timestamps and fork timestamps as parsed Date values; if parsing fails, log a warning and fall back to lexical comparison.
  • Include older-partition Codex session files by recursively discovering *.jsonl files under the Codex sessions root during refresh.
  • Skip recursive discovery on warm-cache reads, so it runs only during refresh.
  • Bump the Codex cache artifact version and store forkedFromId.
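Recursive discovery can be sketched with `FileManager.enumerator(at:includingPropertiesForKeys:options:errorHandler:)`. This is a minimal illustration, assuming session logs are identified only by the `.jsonl` extension; `discoverSessionFiles` is a hypothetical name, and the demo builds a throwaway directory tree rather than reading a real Codex sessions root.

```swift
import Foundation

/// Hypothetical helper: recursively collects every *.jsonl file under `root`,
/// regardless of which date-partition directory it sits in.
func discoverSessionFiles(root: URL) -> [URL] {
    guard let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: nil,
        options: [.skipsHiddenFiles]
    ) else { return [] }

    var files: [URL] = []
    while let url = enumerator.nextObject() as? URL {
        if url.pathExtension == "jsonl" {
            files.append(url)
        }
    }
    // Sort by path for a deterministic scan order.
    return files.sorted { $0.path < $1.path }
}

// Demo: one file in an old partition, one in a current partition, plus a
// non-session file that should be ignored.
let root = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString)
for (dir, name) in [("2026/01/05", "old.jsonl"), ("2026/03/11", "new.jsonl"), ("2026/03/11", "notes.txt")] {
    let folder = root.appendingPathComponent(dir)
    try FileManager.default.createDirectory(at: folder, withIntermediateDirectories: true)
    _ = FileManager.default.createFile(atPath: folder.appendingPathComponent(name).path, contents: Data())
}
let found = discoverSessionFiles(root: root)
try? FileManager.default.removeItem(at: root)

print(found.map { $0.lastPathComponent })  // ["old.jsonl", "new.jsonl"]
```

Walking the whole tree costs I/O proportional to the number of stored sessions, which is consistent with the PR's choice to run it during refresh and keep it out of warm-cache reads.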

Tests Added

  • Long-lived session stored under an older date partition is included in the daily report.
  • Forked child subtracts parent totals at the fork timestamp.
  • Forked child ignores replayed parent prefix sequences.
  • Forked child resolves the parent when the parent session file is a symlink.
  • Forked child resolves the correct parent by exact session_meta.id.
  • Forked child compares parent snapshots by parsed timestamp when UTC and offset timestamp formats are mixed.
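The first test can be illustrated with a minimal Swift sketch that attributes usage to a local calendar day by event timestamp rather than by where the session file is stored. It assumes cumulative totals are differenced event by event, starting from zero; `TokenEvent` and `tokens(on:in:events:)` are hypothetical names, and the events are illustrative values, not real data. It deliberately ignores forks: for a forked child, the running total would need to start from the inherited baseline instead of zero.

```swift
import Foundation

// Hypothetical, simplified model of a session's token_count events.
struct TokenEvent {
    let timestamp: Date
    let cumulativeTotal: Int
}

/// Sums the per-event increase of the cumulative counter for events whose
/// timestamps fall inside the given local calendar day. The directory the
/// session file lives in plays no role.
func tokens(on day: DateComponents, in timeZone: TimeZone, events: [TokenEvent]) -> Int {
    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = timeZone
    guard let start = calendar.date(from: day),
          let end = calendar.date(byAdding: .day, value: 1, to: start) else { return 0 }

    var previous = 0
    var sum = 0
    for event in events.sorted(by: { $0.timestamp < $1.timestamp }) {
        let delta = max(0, event.cumulativeTotal - previous)
        previous = event.cumulativeTotal
        if event.timestamp >= start && event.timestamp < end {
            sum += delta
        }
    }
    return sum
}

let iso = ISO8601DateFormatter()  // default options: internet date-time
let events = [
    ("2026-03-09T10:00:00+08:00", 1_000),  // session began two days earlier
    ("2026-03-11T09:00:00+08:00", 1_500),
    ("2026-03-11T20:00:00+08:00", 2_100),
    ("2026-03-12T01:00:00+08:00", 2_500),
].map { TokenEvent(timestamp: iso.date(from: $0.0)!, cumulativeTotal: $0.1) }

let shanghai = TimeZone(identifier: "Asia/Shanghai") ?? TimeZone(secondsFromGMT: 8 * 3600)!
let march11 = DateComponents(year: 2026, month: 3, day: 11)
print(tokens(on: march11, in: shanghai, events: events))  // 1100
```

Even though the session began on March 9, its March 11 events still contribute 500 + 600 tokens to that day.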

Validation

Passed locally:

  • swift test --filter CostUsageScannerBreakdownTests
  • swift test --filter CostUsageFetcherTests
  • ./Scripts/compile_and_run.sh
  • git diff --check

Real-data validation on local .codex data, Asia/Shanghai timezone:

  • Skill reference result for local day 2026-03-11: 166,904,086 tokens
  • Patched CodexBar result for local day 2026-03-11: 166,904,086 tokens

Before the fix, the same day was incorrectly reported as roughly 8.4B tokens because replayed usage from forked child sessions dominated the result. Separately, upstream discovery could omit cross-day sessions stored under older path partitions.
